CHAPTER 20 Getting the Hint from Epidemiologic Inference 293
Avoiding overloading
You may think that choosing which covariates belong in a regression model is easy.
You just put all the confounders and the exposure in as covariates and you’re done,
right? Unfortunately, it’s not that simple. Each time you add a covariate to a
regression model, you add some estimation error to the model, no matter which
covariate you choose, because the model has to estimate one more coefficient from
the same data. Although there is no official maximum number of covariates in a
model, it is possible to add so many that the software cannot fit the model and
returns an error. At the same time, an added covariate never makes the fit look
worse, even if the covariate is useless. In a logistic regression model as discussed
in Chapter 18, each covariate you add increases the overall likelihood of the model
(or, at worst, leaves it unchanged). In Chapter 17, which focuses on ordinary
least-squares regression, adding a covariate increases your regression (explained)
sum of squares in the same way.
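To make the sum-of-squares point concrete, here is a minimal sketch in Python using NumPy. It is an illustration under stated assumptions, not an analysis from this book: the data are simulated, and the variable names (confounder, exposure, outcome, junk) and the helper sums_of_squares are hypothetical. It fits an ordinary least-squares model twice, once with the exposure and a confounder and once with an extra covariate that is pure noise, and compares the explained sum of squares.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
n = 200

# Hypothetical simulated data: a confounder drives both exposure and outcome.
confounder = rng.normal(size=n)
exposure = 0.5 * confounder + rng.normal(size=n)
outcome = 1.0 * exposure + 0.8 * confounder + rng.normal(size=n)
junk = rng.normal(size=n)  # pure noise, unrelated to the outcome


def sums_of_squares(covariates, y):
    """Fit OLS with an intercept; return (explained SS, residual SS)."""
    X = np.column_stack([np.ones(len(y))] + list(covariates))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residual_ss = float(np.sum((y - X @ beta) ** 2))
    total_ss = float(np.sum((y - y.mean()) ** 2))
    return total_ss - residual_ss, residual_ss


explained_small, resid_small = sums_of_squares([exposure, confounder], outcome)
explained_big, resid_big = sums_of_squares([exposure, confounder, junk], outcome)

print(f"Without junk covariate: explained SS = {explained_small:.2f}")
print(f"With junk covariate:    explained SS = {explained_big:.2f}")
```

Because the smaller model is nested inside the larger one, the explained sum of squares cannot go down when the junk covariate is added, even though that covariate has nothing to do with the outcome. That is why a rising sum of squares by itself is not evidence that a covariate belongs in the model.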
What this means is that you don’t want to add covariates to your model that just
increase error and don’t help with the overall goal of model fit. A good strategy is
to find the collection of covariates that together explain as much variation in the
outcome as possible without overlapping one another. Think of roommates who share
apartment-cleaning duties. It’s best if they split up the apartment and each clean
different parts of it, rather than insisting on cleaning the same rooms, which would
be a waste of time. The term parsimony refers to including the fewest covariates in
your regression model that explain the most variation in the dependent variable.
The modeling approaches discussed in the next section explain ways to develop such
parsimonious models.
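The roommate analogy can be sketched with simulated data. The example below is a hedged illustration, not one of the modeling approaches from the next section: the covariates x1, x2, and x1_twin, the helper functions, and the choice of adjusted R-squared as a simple penalized measure of fit are all assumptions made for this sketch. Two covariates that "clean the same room" (x1 and its near-duplicate x1_twin) are compared with two covariates that each explain a different part of the outcome (x1 and x2).

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed so the sketch is repeatable
n = 500

# Hypothetical simulated covariates and outcome.
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)                   # independent of x1
x1_twin = x1 + 0.1 * rng.normal(size=n)   # nearly duplicates x1
y = x1 + x2 + rng.normal(size=n)


def r_squared(covariates, y):
    """Fit OLS with an intercept and return the proportion of variation explained."""
    X = np.column_stack([np.ones(len(y))] + list(covariates))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residual_ss = np.sum((y - X @ beta) ** 2)
    total_ss = np.sum((y - y.mean()) ** 2)
    return float(1.0 - residual_ss / total_ss)


def adjusted_r_squared(r2, n_obs, n_covariates):
    """Penalize R-squared for the number of covariates in the model."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_covariates - 1)


r2_overlap = r_squared([x1, x1_twin], y)   # two covariates, same "room"
r2_split = r_squared([x1, x2], y)          # two covariates, different "rooms"
r2_all = r_squared([x1, x2, x1_twin], y)   # all three covariates

for label, r2, k in [("x1 + x1_twin", r2_overlap, 2),
                     ("x1 + x2", r2_split, 2),
                     ("x1 + x2 + x1_twin", r2_all, 3)]:
    print(f"{label:>18}: R2 = {r2:.3f}, adjusted R2 = {adjusted_r_squared(r2, n, k):.3f}")
```

With the same number of covariates, the pair that splits the work explains far more of the variation than the pair that overlaps. Adding the near-duplicate to the better pair can only nudge the plain R-squared upward, which is why a penalized measure is printed alongside it for comparison.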
FIGURE 20-1: Example of how confounders are associated with exposure and outcome but are not on the causal pathway between exposure and outcome.
© John Wiley & Sons, Inc.